Recent census results in the US and in China indicate that both countries are likely to start shrinking in population much sooner than previously expected.[1]
The sustainable development of a country is inseparable from a steady stream of young people. Based on my own experience, I think that from a macro perspective, whether people are willing to have more children largely depends on the overall level of social and economic development[2], the relevant welfare policies and the regional culture. From a micro perspective, I think there are some explicit factors (such as household income,…) and implicit factors (living environment, education level of parents,…)[3] that also affect people’s willingness. I want to explore and uncover this relationship.
Before I bring up some questions, I will define the variable that I want to explore. I am not yet sure exactly what this variable should be, but it should represent the willingness of parents of childbearing age to have a child at a certain time. One option is the number of children in a household who are younger than a certain age. I simply call this variable the birth rate here.
With the variable identified, I can explore several questions:
> How do different factors (household income, education level of parents, parents’ working hours, environmental factors) affect the birth rate? (regression)
> For people of different races or ethnicities, does the effect of the above factors change?
> [Optional] If the factors above explain only part of the variation in the birth rate, what kind of data or methods do we need to bring in to improve the regression accuracy? For example, how can policies that encourage fertility be quantified?
I will use the Public Use Microdata Sample (PUMS) and CalEnviroScreen as my base datasets. From PUMS, I can find the variables needed to measure the birth rate as well as some factors that affect it. From CalEnviroScreen, I can find some environmental factors.
[1] https://www.bbc.com/news/world-57112631
[2] Yin, Xiao-Cui, and Chi-Wei Su. “House Prices and China’s Birth Rate: A Note.” Asian Economics Letters 2.2 (2021): 22334.
[3] Kearney, Melissa Schettini, Phillip B. Levine, and Luke W. Pardue. The Puzzle of Falling US Birth Rates Since the Great Recession. No. w29286. National Bureau of Economic Research, 2021.
At this stage, I mainly focus on ACS data (2019/acs/acs5). I found two different variables that can reflect the birth rate: the proportion of young children (under 5) in the Bay Area (group B01001) and the fertility rate in the past year (B13002_001E). Two multiple regression models are built to explore the significant influencing factors.
The average percentage of young children (under 5) in the Bay Area from 2015 to 2019 is 5.53%. The chart shows that this share is distributed fairly evenly across the region.
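The two outcome variables can be sketched as simple ratios of ACS counts. The block below is a minimal illustration, not the code used for this report: the data frame `acs` and its counts are made up, and it assumes that the under-5 share is built from B01001_003E and B01001_027E (males and females under 5) over B01001_001E, and that the fertility rate is B13002_002E over B13002_001E, both expressed in percent.

```r
# Toy table; column names follow ACS variable IDs, counts are invented.
acs <- data.frame(
  GEOID       = c("A", "B", "C"),     # hypothetical area identifiers
  B01001_001E = c(3000, 4200, 2500),  # total population
  B01001_003E = c(80, 130, 60),       # males under 5 years
  B01001_027E = c(75, 120, 65),       # females under 5 years
  B13002_001E = c(800, 1100, 600),    # women 15 to 50 years
  B13002_002E = c(40, 66, 24)         # women who had a birth in the past 12 months
)

# Share of the population under 5, in percent
acs$percent_young <- 100 * (acs$B01001_003E + acs$B01001_027E) / acs$B01001_001E

# Share of women 15 to 50 who gave birth in the past 12 months, in percent
acs$perc_fertility <- 100 * acs$B13002_002E / acs$B13002_001E

print(acs[, c("GEOID", "percent_young", "perc_fertility")])
```

Working in percent rather than proportions only rescales the regression coefficients; it does not change significance.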
Explore related features
Median household income in the past 12 months: B19013_001E (Estimate!!Median household income in the past 12 months (in 2019 inflation-adjusted dollars))
Percentage of households with income over $100,000: B19001_001E, B19001_014E, B19001_015E, B19001_016E, B19001_017E
Individual income in the past 12 months: B06010_001E
Percentage of households with low income in the past 12 months (householders aged 25 to 44 only)
Here, low household income means less than $30,000
Percentage below 100 percent of the poverty level in the past 12 months (women aged 15 to 50 only)
B15003
perc_college: the percentage of the population 25 years and over with a college degree or more
B11005 HOUSEHOLDS BY PRESENCE OF PEOPLE UNDER 18 YEARS BY HOUSEHOLD TYPE
perc_m: the percentage of married-couple families
B23010 PRESENCE OF OWN CHILDREN UNDER 18 YEARS IN MARRIED-COUPLE FAMILIES BY WORK EXPERIENCE OF HOUSEHOLDER AND SPOUSE
household_full: the percentage of householders who worked full-time, year-round in the past 12 months
household_less_full: the percentage of householders who worked less than full-time, year-round in the past 12 months
household_no_work: the percentage of householders who did not work in the past 12 months
Features in CES 4.0 data
Features: CES 4.0 Score, PM2.5, Education, Poverty, Housing Burden, Unemployment
PM2.5 is a factor to reflect the environmental effect (may be less important)
Concatenate all features (from ACS, calenviroscreen40)
Conduct a correlation analysis
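The concatenation and correlation steps can be sketched as a join on a shared area identifier followed by `cor()`. This is an illustrative sketch only: the key column `tract`, both toy data frames, and all values are hypothetical, and only a few of the features listed above are included.

```r
set.seed(1)
n <- 50

# Hypothetical ACS-derived features, one row per area
acs_feat <- data.frame(
  tract         = 1:n,
  percent_young = runif(n, 2, 9),
  median_income = runif(n, 40000, 200000),
  perc_college  = runif(n, 0.1, 0.9)
)

# Hypothetical CalEnviroScreen indicators, deliberately in a different row order
ces <- data.frame(
  tract   = sample(1:n),
  PM2.5   = runif(n, 6, 11),
  Poverty = runif(n, 5, 60)
)

# Join on the shared key so rows line up regardless of their original order
df <- merge(acs_feat, ces, by = "tract")

# Pairwise correlations between all numeric features (key column excluded)
num_cols <- setdiff(names(df), "tract")
cor_mat  <- cor(df[, num_cols], use = "pairwise.complete.obs")

# Correlation of every feature with the outcome
round(cor_mat[, "percent_young"], 2)
```

Joining on a key rather than binding columns side by side avoids silently misaligning rows when the two sources are sorted differently.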
##
## Call:
## lm(formula = percent_young ~ median_income + perc_over100k +
## perc_college + indi_income + perc_low_income_25_44 + perc_poverty +
## perc_m + household_full + household_less_full + household_no_work +
## `CES 4.0 Score` + PM2.5 + Education + Poverty + `Housing Burden` +
## Unemployment, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.1171 -1.4145 -0.1608 1.2554 12.1716
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -7.633e-02 1.100e+00 -0.069 0.94470
## median_income 3.716e-06 3.804e-06 0.977 0.32868
## perc_over100k -1.072e-02 1.163e-02 -0.922 0.35668
## perc_college -1.790e+00 6.701e-01 -2.672 0.00762 **
## indi_income -9.032e-05 3.660e-05 -2.468 0.01371 *
## perc_low_income_25_44 1.826e+00 8.972e-01 2.035 0.04206 *
## perc_poverty -1.629e-02 1.101e-02 -1.480 0.13914
## perc_m 3.096e+00 5.987e-01 5.172 2.64e-07 ***
## household_full 7.084e+00 6.691e-01 10.588 < 2e-16 ***
## household_less_full 5.890e+00 9.361e-01 6.293 4.11e-10 ***
## household_no_work NA NA NA NA
## `CES 4.0 Score` 1.648e-02 8.379e-03 1.967 0.04942 *
## PM2.5 -3.087e-02 9.944e-02 -0.310 0.75630
## Education -5.411e-03 1.131e-02 -0.478 0.63248
## Poverty 3.208e-02 1.234e-02 2.600 0.00941 **
## `Housing Burden` -1.414e-02 1.234e-02 -1.146 0.25191
## Unemployment -1.332e-02 2.436e-02 -0.547 0.58469
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.058 on 1476 degrees of freedom
## Multiple R-squared: 0.1562, Adjusted R-squared: 0.1476
## F-statistic: 18.22 on 15 and 1476 DF, p-value: < 2.2e-16
The regression results show that educational attainment (perc_college, the percentage of the population 25 years and over with a college degree or more) and poverty (the CES 4.0 Poverty indicator) are significant at the 0.01 level for the proportion of young children (under 5) in the Bay Area (2015-2019). Higher educational attainment and lower poverty are associated with a lower proportion of young children, which matches common sense. Marital status (perc_m) and work experience / employment (household_full, household_less_full) are highly significant (p < 0.001). Areas with a larger share of married-couple families tend to have a higher proportion of young children, and so do areas with a larger share of householders who worked full-time. These are associations rather than causal effects, and the model explains only about 16% of the variance (multiple R-squared 0.1562).
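The output above reports one coefficient as "not defined because of singularities" (household_no_work is NA). A likely explanation, stated here as an assumption rather than something confirmed by the data, is that the three work-status shares sum to a constant in every row, so the last one is a linear combination of the intercept and the other two. The simulated example below reproduces that behaviour; all variable names and values are hypothetical.

```r
set.seed(42)
n <- 200

# Hypothetical work-status shares that sum to 1 in every row
full      <- runif(n, 0.3, 0.7)
less_full <- runif(n, 0.1, 0.3)
no_work   <- 1 - full - less_full

# Simulated outcome
y <- 2 + 7 * full + 6 * less_full + rnorm(n)

# Including all three shares plus an intercept is rank deficient:
# lm() drops the aliased term and reports NA for it
fit_all <- lm(y ~ full + less_full + no_work)
coef(fit_all)

# Dropping one share gives the same fit for the remaining terms
fit_ref <- lm(y ~ full + less_full)
coef(fit_ref)
```

Under this assumption, the household_full and household_less_full coefficients are best read as contrasts against the omitted no-work share, not as standalone effects.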
WOMEN 15 TO 50 YEARS WHO HAD A BIRTH IN THE PAST 12 MONTHS
B13002_001E, B13002_002E
Comparing the fertility-rate chart with the earlier chart of the under-5 share, we can see that the two patterns overlap in some areas.
Extra Discovery
The marital-status breakdown of women who recently had a birth differs across age groups.
##
## Call:
## lm(formula = perc_fertility ~ median_income + perc_over100k +
## perc_college + indi_income + perc_low_income_25_44 + perc_poverty +
## perc_m + household_full + household_less_full + household_no_work +
## `CES 4.0 Score` + PM2.5 + Education + Poverty + `Housing Burden` +
## Unemployment, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.4874 -2.1479 -0.3774 1.6020 13.0595
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.515e+00 1.539e+00 1.635 0.10236
## median_income -3.142e-06 5.319e-06 -0.591 0.55480
## perc_over100k 5.861e-03 1.626e-02 0.360 0.71855
## perc_college -1.091e+00 9.371e-01 -1.164 0.24443
## indi_income -1.237e-04 5.119e-05 -2.416 0.01582 *
## perc_low_income_25_44 2.237e+00 1.255e+00 1.783 0.07485 .
## perc_poverty -1.352e-02 1.539e-02 -0.878 0.37998
## perc_m 2.492e+00 8.373e-01 2.976 0.00296 **
## household_full 2.781e+00 9.357e-01 2.972 0.00301 **
## household_less_full 3.432e+00 1.309e+00 2.622 0.00884 **
## household_no_work NA NA NA NA
## `CES 4.0 Score` 1.897e-02 1.172e-02 1.619 0.10562
## PM2.5 -9.495e-02 1.391e-01 -0.683 0.49484
## Education 2.039e-02 1.582e-02 1.289 0.19759
## Poverty -4.283e-03 1.726e-02 -0.248 0.80401
## `Housing Burden` -2.832e-03 1.726e-02 -0.164 0.86966
## Unemployment 1.604e-03 3.406e-02 0.047 0.96245
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.878 on 1476 degrees of freedom
## Multiple R-squared: 0.04943, Adjusted R-squared: 0.03977
## F-statistic: 5.116 on 15 and 1476 DF, p-value: 5.332e-10
This model gives broadly similar results. Marital status / household type (perc_m) and work experience / employment (household_full, household_less_full) are significant at the 0.01 level for the fertility rate, and income (indi_income) is significant at the 0.05 level. The fit is weaker than in the first model, with a multiple R-squared of 0.04943.
The ACS results suggest that factors such as income, employment and marital status are closely related to the fertility rate. I therefore analyzed the Public Use Microdata Sample (5-year, 2019). The following section is a household-level regression analysis based on PUMS.
Several factors in PUMS are selected to conduct the regression.
FPARC ~ PUMA + TEN + HHT + HINCP + FES + HHL
In the above regression equation, FPARC means ‘Family presence and age of related children’. This variable is converted to a 0-1 variable, where FPARC = 1 means there is at least one related child under 5 years in the household. PUMA is the Public Use Microdata Area code. TEN means tenure. HINCP means household income (past 12 months; ADJINC is used to adjust HINCP to constant dollars). TEN and HINCP reflect the basic economic situation of a family. HHT means household/family type. FES means family type and employment status. HHL means household language; with the other variables held constant, it can reflect differences in fertility between racial or ethnic groups to a certain extent. All of these are categorical (factor) explanatory variables except HINCP, which is numeric.
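The preparation steps described above can be sketched as follows. This is a minimal sketch on simulated data, not the code behind the results below. It assumes the PUMS coding in which FPARC values 1 and 3 indicate related children under 5, and it assumes ADJINC is stored with six implied decimal places (so 1010145 stands for 1.010145); both should be checked against the PUMS data dictionary. HHT and FES are left out to keep the example short, and the names `pums`, `under5` and `HINCP_adj` are hypothetical.

```r
set.seed(7)
n <- 2000

# Simulated household records with invented values
pums <- data.frame(
  FPARC  = sample(1:4, n, replace = TRUE, prob = c(0.08, 0.20, 0.07, 0.65)),
  PUMA   = sample(c("101", "102", "103"), n, replace = TRUE),
  TEN    = sample(1:4, n, replace = TRUE),
  HHL    = sample(1:5, n, replace = TRUE),
  HINCP  = round(runif(n, 10000, 300000)),
  ADJINC = 1010145  # assumed raw form: six implied decimal places
)

# 0-1 outcome: at least one related child under 5 (assumed codes 1 and 3)
pums$under5 <- as.integer(pums$FPARC %in% c(1, 3))

# Household income in constant dollars
pums$HINCP_adj <- pums$HINCP * pums$ADJINC / 1e6

# Treat the coded variables as categorical
for (v in c("PUMA", "TEN", "HHL")) pums[[v]] <- factor(pums[[v]])

# Logistic regression with a quasibinomial family, as in the reported model
fit <- glm(under5 ~ PUMA + TEN + HINCP_adj + HHL,
           family = quasibinomial(), data = pums)
summary(fit)$coefficients[1:4, ]
```

Because the first level of each factor is absorbed into the intercept, every reported factor coefficient is a contrast against that reference level.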
Fertility regression model based on PUMS
##
## Call:
## glm(formula = FPARC ~ PUMA + TEN + HHT + HINCP + FES + HHL, family = quasibinomial(),
## data = pums_fert1)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.3801 -0.6508 -0.5168 -0.2272 3.1405
##
## Coefficients: (2 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.034e+00 7.279e-02 -27.949 < 2e-16 ***
## PUMA102 1.332e-01 9.409e-02 1.416 0.156793
## PUMA103 2.116e-01 9.446e-02 2.240 0.025081 *
## PUMA104 5.216e-01 9.595e-02 5.436 5.47e-08 ***
## PUMA105 2.199e-01 9.223e-02 2.384 0.017116 *
## PUMA106 1.416e-01 9.702e-02 1.460 0.144285
## PUMA107 2.307e-01 9.203e-02 2.507 0.012194 *
## PUMA108 1.854e-01 9.097e-02 2.038 0.041530 *
## PUMA109 6.815e-02 8.585e-02 0.794 0.427343
## PUMA110 5.995e-02 8.470e-02 0.708 0.479024
## PUMA1301 3.137e-01 1.014e-01 3.094 0.001974 **
## PUMA1302 -9.610e-02 1.016e-01 -0.946 0.344306
## PUMA1303 3.369e-01 9.891e-02 3.406 0.000659 ***
## PUMA1304 -3.741e-01 1.109e-01 -3.374 0.000741 ***
## PUMA1305 -1.148e-01 9.743e-02 -1.178 0.238704
## PUMA1306 3.317e-02 1.033e-01 0.321 0.748119
## PUMA1307 4.153e-01 9.796e-02 4.240 2.24e-05 ***
## PUMA1308 2.986e-01 1.086e-01 2.749 0.005976 **
## PUMA1309 3.492e-01 1.047e-01 3.335 0.000852 ***
## PUMA4101 -7.807e-02 1.132e-01 -0.690 0.490223
## PUMA4102 -1.209e-01 9.888e-02 -1.223 0.221351
## PUMA5500 -1.856e-04 9.497e-02 -0.002 0.998440
## PUMA7501 1.583e-02 1.028e-01 0.154 0.877684
## PUMA7502 -4.274e-01 1.217e-01 -3.511 0.000447 ***
## PUMA7503 -9.410e-02 1.131e-01 -0.832 0.405619
## PUMA7504 1.534e-01 1.077e-01 1.424 0.154440
## PUMA7505 7.289e-02 1.091e-01 0.668 0.503996
## PUMA7506 5.311e-02 1.082e-01 0.491 0.623385
## PUMA7507 8.498e-02 1.067e-01 0.797 0.425668
## PUMA8101 6.354e-02 1.003e-01 0.634 0.526406
## PUMA8102 -4.815e-02 1.062e-01 -0.453 0.650248
## PUMA8103 9.918e-03 1.009e-01 0.098 0.921677
## PUMA8104 4.194e-02 9.371e-02 0.448 0.654467
## PUMA8105 7.604e-02 9.319e-02 0.816 0.414499
## PUMA8106 1.377e-01 9.803e-02 1.404 0.160224
## PUMA8501 -1.502e-01 8.803e-02 -1.706 0.088064 .
## PUMA8502 2.223e-01 8.932e-02 2.489 0.012819 *
## PUMA8503 7.508e-02 9.335e-02 0.804 0.421222
## PUMA8504 1.801e-01 9.416e-02 1.913 0.055756 .
## PUMA8505 5.867e-02 1.018e-01 0.577 0.564263
## PUMA8506 5.078e-02 1.013e-01 0.501 0.616272
## PUMA8507 -4.483e-01 9.917e-02 -4.520 6.18e-06 ***
## PUMA8508 -1.383e-02 9.639e-02 -0.143 0.885943
## PUMA8509 1.059e-01 1.011e-01 1.047 0.294968
## PUMA8510 2.524e-01 9.292e-02 2.716 0.006609 **
## PUMA8511 1.978e-01 9.965e-02 1.985 0.047104 *
## PUMA8512 -7.980e-02 9.796e-02 -0.815 0.415278
## PUMA8513 -8.062e-02 1.045e-01 -0.771 0.440579
## PUMA8514 2.811e-01 9.930e-02 2.830 0.004650 **
## PUMA9501 1.581e-01 9.635e-02 1.641 0.100808
## PUMA9502 4.052e-01 9.288e-02 4.362 1.29e-05 ***
## PUMA9503 1.156e-01 9.992e-02 1.157 0.247128
## PUMA9701 -7.947e-02 9.506e-02 -0.836 0.403148
## PUMA9702 8.933e-02 1.013e-01 0.882 0.377941
## PUMA9703 1.551e-01 9.568e-02 1.622 0.104903
## TEN2 -6.130e-01 3.500e-02 -17.515 < 2e-16 ***
## TEN3 5.716e-01 2.141e-02 26.697 < 2e-16 ***
## TEN4 2.585e-01 9.660e-02 2.676 0.007456 **
## HHT2 -9.175e-01 1.103e-01 -8.319 < 2e-16 ***
## HHT3 -4.998e-01 5.254e-02 -9.513 < 2e-16 ***
## HINCP 8.053e-07 6.363e-08 12.656 < 2e-16 ***
## FES2 3.035e-01 2.294e-02 13.227 < 2e-16 ***
## FES3 -9.736e-01 5.360e-02 -18.165 < 2e-16 ***
## FES4 -1.943e+00 6.403e-02 -30.346 < 2e-16 ***
## FES5 4.881e-01 1.167e-01 4.182 2.90e-05 ***
## FES6 NA NA NA NA
## FES7 6.984e-02 5.820e-02 1.200 0.230193
## FES8 NA NA NA NA
## HHL2 5.023e-01 2.662e-02 18.867 < 2e-16 ***
## HHL3 4.097e-01 3.113e-02 13.160 < 2e-16 ***
## HHL4 2.507e-01 2.548e-02 9.839 < 2e-16 ***
## HHL5 5.032e-01 6.933e-02 7.258 3.95e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasibinomial family taken to be 0.98315)
##
## Null deviance: 80983 on 93023 degrees of freedom
## Residual deviance: 74379 on 92954 degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 6
The regression results show that almost all of the selected variables are statistically significant for the fertility rate / child-rearing rate. Each factor coefficient is a difference in log-odds relative to the reference level of that factor.

In terms of PUMAs, some areas are significantly more child-rearing than the reference PUMA, such as PUMA104, PUMA1303, PUMA1307, PUMA1309 and PUMA9502, while others are significantly less child-rearing, such as PUMA1304, PUMA7502 and PUMA8507.

In terms of TEN, households that rent (TEN3) or occupy their home without payment of rent (TEN4) are significantly more child-rearing than the reference level (TEN1), while TEN2 households are significantly less so. HINCP (household income) also has a significant positive coefficient. In terms of HHT, both HHT2 and HHT3 are negative, so married-couple households are significantly more child-rearing than other family types.

In terms of FES, among married-couple families, child-rearing is more likely when the husband is in the labor force and the wife is not (FES2 > 0) than when the husband is not in the labor force and the wife is (FES3 < 0). This suggests that the husband’s employment status matters more for child-rearing. This is not an encouraging pattern, since it probably reflects the fact that men generally earn more than women.

HHL is also a significant variable. All four non-reference levels (HHL2 to HHL5) have positive coefficients, so households that speak a language other than English at home are more child-rearing than English-only households (the reference level, HHL1). The largest coefficients are for Spanish (HHL2) and other languages (HHL5).
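Since the model is fitted on the log-odds scale, exponentiating a coefficient gives an odds ratio, which is often easier to read. The short sketch below does this for a few estimates copied from the output above; the vector `est` and the choice of which coefficients to include are purely for illustration.

```r
# Selected estimates copied from the regression output (log-odds scale)
est <- c(TEN2 = -0.6130, TEN3 = 0.5716, HHT2 = -0.9175,
         FES2 = 0.3035, FES3 = -0.9736, HHL2 = 0.5023)

# Odds ratios relative to each factor's reference level
odds_ratio <- exp(est)
round(odds_ratio, 2)

# HINCP is measured per dollar, so rescale to a more readable unit
hincp_per_dollar <- 8.053e-07
or_per_100k <- exp(hincp_per_dollar * 1e5)
round(or_per_100k, 3)
```

Read this way, renting (TEN3) corresponds to roughly 1.77 times the odds of having a child under 5 compared with the reference tenure, and each additional $100,000 of household income corresponds to an odds ratio of about 1.08, holding the other variables constant.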